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Abstra ct 

An investigation is conducted into the viability of 
using a generalized Conjugate Gradient-like method 
as an iterative solver to obtain steady-state solu- 
tions of very low-speed fluid flow problems. Low- 
speed flow at Mach 0.1 over a backward-facing step 
is chosen as a representative test problem. The un- 
steady form of the two-dimensional, compressible 
Navier-Stokes equations is integrated in time using 
discrete time-steps. The Navier-Stokes equations 
are cast in an implicit, upwind finite- volume, flux 
split formulation. The new iterative solver is used 
to solve a linear systems of equations at each step 
of the time-integration. Preconditioning techniques 
are used with the new solver to enhance the stability 
and convergence rate of the solver, and are found 
to be critical to the overall success of the solver. 
A study of various preconditioners reveals that a 
preconditioner based on the Lower-Upper Succes- 
sive Symmetric Over- Relaxation iterative scheme is 
more efficient than a preconditioner based on In- 
complete L-U factorizations of the iteration matrix. 
The performance of the new preconditioned solver 
is compared with a conventional Line Gauss-Seidel 
Relaxation (LGSR) solver. Overall speed-up factors 
of 28 (in terms of global time-steps required to con- 
verge to a steady-state solution) and 20 (in terms of 
total CPU time on one processor of a CRAY-YMP) 
are found in favor of the new preconditioned solver, 
when compared with the LGSR solver. 

Introduction 

Conventional iterative solvers like Line Gauss- 
Seidel Relaxation 1 (LGSR) and the Approximate 
Factorization 2 (AF) scheme have enjoyed consider- 
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able popularity as tools for solving the large sys- 
tems of simultaneous, linear equations that appear 
at each time-step of an implicit time-integration 
scheme. However, these conventional solvers exhibit 
several documented weaknesses 3 , namely, (i) con- 
vergence of LGSR depends on the choice of the 
relaxation parameter (under-relaxation is often re- 
quired for stability), (ii) LGSR and AF encounter 
a maximum time-step restriction (even in an im- 
plicit formulation) in complex flow problems, and, 
(iii) convergence of AF is sensitive to the time-step 
employed in the time-integration. These weaknesses 
limit the applicability of these solvers, and they aTe 
particularly inefficient when applied to ‘stiff* prob- 
lems (e. g. low-speed flows). 

The past few years have seen a resurgence of 
interest in using other methods like the Precondi- 
tioned Conjugate-Gradient Method 4 (PCGM), and 
its generalizations, as iterative solvers. The earli- 
est instances of the application of PCGMs to fluid 
flow problems are found in the works of Wong and 
Hafez 5 . The work in reference 5 used PCGMs to 
solve the potential flow equations for calculations 
of flow over transonic airfoils. Wigton et. al 6 used 
the GMRES 7 algorithm to improve the robustness 
and convergence of existing CFD codes. Reference 6 
used GMRES to obtain efficient potential and Euler 
solutions for subsonic and transonic flow over air- 
foils. The work of Venkatakrishnan 8 has provided 
positive evidence of the viability of preconditioned 
GMRES for the compressible Navier-Stokes equa- 
tions. Reference 8 used PCGMs to obtain viscous 
solutions for subsonic and transonic flow over air- 
foils. 

Inspite of the efforts mentioned above, the fam- 
ily of PCGMs has not attracted the full attention of 
researchers working in the area of algorithm devel- 
opment for linear system solvers, (i) The poweT of 
generalized CGMs (like GMRES) has not been har- 
nessed to develop a common computational tool for 
low-speed (or incompressible) and compressible flow 
— the regime of low-speed flows is virtually unex- 
plored. (ii) The issues of how and when to terminate 
the preconditioned GMRES solver need to be exam- 
ined. The termination criteria determines the num- 
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ber of sub-iterates at each global iteration (or time- 
step) , and directly affects the storage requirements 
and computational efficiency of the solver, (iii) The 
performance of preconditioned GMRES (and other 
related algorithms) needs to be documented against 
the performance of conventional solvers for upwind 
schemes, particularly for incompressible flow prob- 
lems. None of these issues has been examined in 
detail in the above mentioned references 5-8, or else- 
where in the literature. 

This paper has endeavored to examine the is- 
sues listed in the above paragraph. An attempt has 
thus been made to fill a gap in the existing literature 
by conducting an investigation of the applicability of 
preconditioned generalized Conjugate Gradient like 
methods, for solving flow problems in the low-speed 
regime. This paper also serves to complement the 
earlier work of the authors 9 , where the regime of in- 
terest was transonic and hypersonic flow problems. 
The success of preconditioned CGMs was clearly 
demonstrated in reference 9 for compressible flow 
problems. This encouraged the authors to investi- 
gate the possibility of using preconditioned CGMs 
for solving incompressible flow problems, with the 
goal of developing a universal code for incompress- 
ible and compressible flows. 

A full and more complete description of the the- 
ory will follow this Introduction in the next section. 
The governing equations of fluid flow, the Conjugate 
Gradient Methods, and the details of their imple- 
mentation in this research will be presented. The 
importance of preconditioning, and some possible 
preconditioners will be discussed. The results of 
the application of the new solver to a low-speed 
flow problems will be presented and compared to 
results using a conventional line relaxation algo- 
rithm (i.e., LGSR) on the same test problem. The 
results demonstrate the effectiveness of using pre- 
conditioned, generalized CGMs as iterative solvers. 
The final section enumerates the conclusions drawn 
from this work. 

P resentation of Th eory 


Na vier-Stok es Equ ations 

The governing equations of compressible fluid 
flow in 2-D are the Navier-Stokes equations written 
as 


dQ 8F 8G = dGy 

dt dx dy dx dy 

In this research, the thin-layer form of the above 
equations is used, i.e. all viscous terms which in- 
volve a gradient in the streamwise direction are ne- 
glected. The use of the thin-layeT equations is jus- 
tified because the coarseness of the computational 
mesh used in this research precludes the resolution 
of the viscous fluxes in the streamwise direction. 

The thin-layer equations are transformed from 
cartesian coordinate form (x,y) to generalized (£, 17 ) 


coordinates, which results in 

! 8Q dF dG _ dG v 
J dt + + dj] dr] 


where 


Q = [p, pu, pv , pe 0 ] T ; J -CxVy- tyVx 



; G=^ + 

J J 

J 

Gv = Gfe) 

[9vn 9vi J 9v y > 9v 4 
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tion. In addition, 
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The specific heat ratio, 7 , is taken to be 1.4. 
The molecular viscosity is given by p, a is the speed 
of sound and Rez, is the Reynolds number per unit 
length. Nondimension alization is with respect to 
the freestream density and velocity. The physical 
coordinates (x,y) and viscosity are nondimension- 
alized by a reference length L and the molecular 
viscosity of the freestream, respectively. It must be 
remarked that, in this research, no correction has 
been made to the compressible flow equations to 
account for the incompressible effects of low-speed 
flow. This is expected to affect the overall perfor- 
mance of the code (i. e. slower convergence rates for 
all solvers). However, the quality of the computed 
solution is not affected, and this is borne out by the 
excellent comparisons obtained with experimental 
data. In short, the compressible flow equations are 
used ‘as is’ for low-speed flow, and this is consis- 
tent with the goal of developing a common compu- 
tational tool for all regimes of flow (incompressible 
and compressible). 

The governing equations are solved computa- 
tionally in their integral, conservation law form, us- 
ing a cell-centered finite volume formulation. In- 
viscid flux terms are upwinded using Van Leer’s 10 
flux-splitting scheme. The thin-layer viscous fluxes 
are evaluated with second order accurate central dif- 
ferences. 
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Equation 2 can be rewritten in compact form 
as 


18Q 
J di 


= -R 


( 3 ) 


where the right-hand-side vector is written as 


„ dF dG dG v 


R is called the residual, and equals to zero for a 
steady state solution. The Euler implicit discretiza- 
tion of equation 3 in time gives 


AQ n = _ R n+l 

JAi 


( 4 ) 


where A Q n is the incremental change in the cell- 
centered values of the vector Q between the n - b 1 th 
time level and the known n th time level, i.e. 


A Q n = Q n+l - Q n (5) 

R n + l is linearized in time about the n th time level 
which results in 

( 1 + AQ" = -iT (6) 

\JAt dQ) * w 

where ///At is a block-diagonal matrix and is 
a large, sparse, block, banded matrix. 

Equation 6 can be rewritten as 

V n AQ n = -R n (7) 


The classical Conjugate Gradient Method (CGM) 
was proposed by Hestenes and Steifel 4 , to solve the 
system Ax = 6, where A is a symmetric and positive 
definite (SPD) matrix. The idea of using CGMs as 
iterative methods was first discussed by Reid 11 . 

The condition that A has to be an SPD ma- 
trix is seemingly restrictive because it is difficult to 
guarantee an SPD matrix for general fluid flow prob- 
lems which incorporate the Navier-Stokes equations. 
However, the method can be generalized for solving 
linear systems where the coefficient matrix is not 
symmetric and/or positive definite. Several gener- 
alizations have been proposed in the literature, par- 
ticularly for non-symmetric systems, by Saad and 
Schultz 7 , Young and Jea 13 , and Axelsson 13 , to name 
a few. The particular generalization used in this 
paper is the Generalized Minimal RESidual (GM- 
RES) method of Saad and Schultz 7 . As the name 
of the method suggests, GMRES seeks to mini- 
mize the norm of the computed residual vectoT, r n , 
(r n = b — Ax n ) at each iteration. 

The GMRES method is directly applicable to 
solve linear systems with non-symmetric coefficient 
matrices. The generalization is effected by a two 
step procedure — the generation of a set of or- 
thonoimal vectors from a given initial guess using 
ArnoldPs Method 14 , and the solution of a minimiza- 
tion problem. Arnoldi’s Method 14 uses the well 
known Gram-Schmidt algorithm for computing an 
orthonormal basis of vectors for the Krylov subspace 
A"(A, vi t k)= span{tq, Arq, . . . , A fc_1 ui}. 

It is possible to build linear system solvers for 
sparse matrices, with the help of the Arnoldi pro- 
cess. In order to solve the linear system Ax = b , we 
calculate an approximate solution x k of the form 
Xk = Xq + z*, where xo is some arbitrary initial 
guess to the exact solution x (= A~ l b). The vector 
Zk lies in the Krylov subspace collectively defined 
by A, r 0 (— b - Axq) and i.e., 


Equation 7 represents the system of linear simul- 
taneous algebraic equations that has to be solved 
at each global time-step in the computation. This 
system of equations can be solved by direct ma- 
trix inversion; however, this requires extensive com- 
puter memory and computation effort at each time 
step. Iterative schemes are attractive because of 
their computational efficiency and relatively mea- 
ger memory requirements. In this paper, a compar- 
ison is made between the efficiency of two iterative 
solvers — a conventional Line Gauss-Seidel Relax- 
ation (LGSR) solver and a preconditioned general- 
ized conjugate-gradient type solver. Results of com- 
parisons with the Approximate Factorization (AF) 
solver may be found in reference 3, but are not pre- 
sented in this paper. 

Generalized C onjugate Gradient Meth ods 

Recall, that we are interested in solving the lin- 
ear system of equations 

V n AQ n = -R n O Ax — b (8) 


Zfc G K(A, r 0 , k) - span{r 0) At 0 , ...,A k 
= span{r 0 ,n, .. .,r fc } 



These methods are referred to as Krylov subspace 
methods. 

The iterative scheme based on Krylov subspace 
methods will be most successful when the iterate x k 
minimizes the corresponding residual norm, ||r*||. 
Mathematically, this is equivalent to a minimization 
of |[r fc || over z k £ K(A 1 r 0> k) ) i.e., 

min||rjfe|| O min |[(6 - Ax*)|| (10) 

Zk Zk 


The solution of this minimization problem forms the 
second part of the GMRES algorithm. 

The complete GMRES algorithm can be sum- 
marized as follows: 

1. For any starting vector Xo, form the initial vec- 
tor r 0 = 6 - Ax 0 ; (3 = ||r 0 || 2 ; tq = r 0 /(3 . 

2. Generate the basis of orthonormal vectors de- 
fined in equation 9 
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3. Form the approximate solution: 

a) Find the vector z* which solves the mini- 
mization problem of equation 10 

b) Compute xjt — + z* 

In summary, the GMRES method is a mini- 
mization process to solve linear matrix systems like 
Ax = b, The minimization proceeds as a sequence 
of k sub-iterations, and the minimizer is obtained 
by a simple upper-triangular solve in step 3 of the 
algorithm. One major practical difficulty with GM- 
RES is that as k increases, both storage and oper- 
ation cost increase as O(k) and 0(k 2 ), respectively. 
Hence, the number of sub-iterations has to be re- 
stricted to minimize the cost of each global iteration. 
This issue, which has not received mnch attention 
in the literature, is addressed later in this paper. 

Further mathematical details, implementation 
techniques and convergence analyses of the method 
are contained in references 3 and 8. Since this pa- 
per is also concerned with convergence acceleration, 
it may be remarked that the speed of convergence 
of the algorithm depends on the condition number 
of the matrix A (« 2 (^)), and the distribution of 
singular- values of A (i.e. the spectrum of A). tt 2 (A) 
is defined as the ratio of the maximum to minimum 
singular- value of the matrix A. If k 2 (A) is large 
and/or the spectrum of singular- values of A is wide 
and scattered, A is said to be poorly conditioned, 
and convergence may be very slow. Precondition- 
ing is employed to improve the conditioning of the 
coefficient matrix. The preconditioning technique 
chosen may hence be critical to the success of any 
generalized CGM used to solve linear systems of 
equations. 

Preconditioning Techniques for CGMs 

One of the most effective iterative methods for 
solving large, sparse linear systems of equations is 
a combination of a generalized Conjugate Gradient 
like procedure with some appropriate precondition- 
ing technique. Assuming that a preconditioning ma- 
trix M is used on the left of the original unprecon- 
ditioned system, this involves solving the precondi- 
tioned linear system 

M~ l Ax — M~ l b & Ax — b (11) 

instead of the original system Ax ~ b. 

The motivation for preconditioning is twofold 
— to reduce the computational effort required to 
solve the linearized system of equations at each 
time-step, and, to reduce the total number of time- 
steps (or global iterations) required to obtain a 
steady-state solution. Preconditioning will be cost- 
effective only if the additional computational work 
incurred for each sub-iteration is compensated for 
by a reduction in the total number of iterations to 
convergence — so that the total cost of solving the 
overall non-linear system is reduced. 

The costs associated with preconditioning can 
be enumerated as (i) Computing the precondition- 
ing matrix M } (ii) Matrix-vector multiplies ot equiv- 
alent linear system solves associated with M, and, 


(iii) Additional computer storage to store M — 
which may be of the order of storage requirements 
for the coefficient matrix A . The selection of an ‘effi- 
cient’ preconditioner is motivated by the minimiza- 
tion of the afore-mentioned costs, and is considered 
crucial to the success of the preconditioned GMRES 
algorithm. 

The cheapest preconditioner is the identity ma- 
trix (of appropriate rank^. However, for M ~ /, 
the original, unpreconditioned iterative scheme is 
recovered! The costliest preconditioner results when 
M — A. This is equivalent to performing a direct 
solver approach and inverting A to obtain x — A~ l b 
— an approach which is unacceptable for the solu- 
tion of large, sparse linear systems of equations. It is 
apparent that a practical preconditioner lies some- 
where between the two extremal choices of M I 
and M — A. It is also evident that M should be in 
some sense “close” to A — so that M~ l A is close to 
the identity matrix. For effective preconditioning, 

this means that the eigenstructure of A should be 
close to that of the identity matrix /, i.e. A should 
be well-conditioned. 

For practical implementations, the ‘explicit’ 
computation of M 1 is not advisable because (i) 
Even though M may be sparse (corresponding to 
the sparsity of A), M~ l may be a dense matrix. 
Storage requirements for M ~ l may thus far exceed 
those for M , and, (iij If M is ill-conditioned (corre- 
sponding to ill-conditioning of A), computation of 
M~ l in computer arithmetic will be highly error- 
prone. 

In practice, ‘explicit’ preconditioning is re- 
placed by an equivalent but highly effective tech- 
nique called ‘implicit’ preconditioning. Implicit pre- 
conditioning transforms the ‘explicit’ problem of 
forming matrix-vector products of the type M~ l u 
(= u) to an equivalent ‘implicit’ problem of solv- 
ing a linear system for u 9 with M as the coefficient 
matrix. This can be written as 

M~ l u — u Mu — u (12) 

This linear system has to be solved for each sub- 
iteration of the preconditioned GMRES algorithm. 
Thus, any matrix M which produces easy-to-solve 
(i.e., computationally efficient) linear systems of the 
type Mu — u, is a potentially acceptable precondi- 
tioner. 

In the light of the preceding discussions, a good 
preconditioning matrix M should 

i) produce an iteration matrix A which is better 

conditioned than A, and 

ii) produce easy-to-solve linear systems of the form 

Mu — u. 

On the basis of the above criteria, several pre- 
conditioners that can be derived from the coefficient 
matrix A are diagonal, block-diagonal, incomplete 
L-U factorization flLUF) and block-ILUF 15 — in 
increasing order of computational cost. Iterative 
methods used in existing CFD codes can also be 
used as effective preconditioners 6 . Some examples 
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are Line Gauss-Seidel Relaxation (LGSR), spatial 
Approximate Factorization (AF), and variants of 
the LUSSOR scheme of Yoon and Jameson 16 . 

The choice of an effective, stable preconditioner 
is extremely important to the success of GMRES. 
For this work, several preconditioners were investi- 
gated for their effectiveness and stability. An “effec- 
tive” preconditioner is defined here as one that as- 
sists GMRES in obtaining rapid convergence, while 
requiring a minimal overhead cost for the precon- 
ditioning. A “stable” preconditioner is one that 
can be successfully computed from the matrix A 
(e. g. the computation of the ILUF of A may be 
unstable if A is severely ill-conditioned). Diago- 
nal preconditioners — both scalar and block ver- 
sions — were found to be ‘unstable* when applied 
to the test case in this paper. LGS relaxation and 
spatial AF, representing the use of existing flow 
solvers in CFD codes as preconditioners, were found 
to be ‘stable* preconditioners. They were how- 
ever not as ‘effective’ in accelerating convergence as 
the LUSSOR (Lower-Upper Symmetric Successive 
Over-Relaxation) or the BILUF (Block Incomplete 
Lower-Upper Factorization with zero fill-in) precon- 
ditioners. LUSSOR and BILUF were thus selected 
as primary preconditioners for use with GMRES 
in this research. The original LUSSOR scheme of 
reference 16 was modified, and then incorporated in 
a block version, for use as a preconditioner in this 
research. Complete details of LUSSOR and BILUF 
preconditioning may be found in reference 3. 

I mplementation o f Pre condi tioned G M RES 

In this paper, the preconditioned GMRES al- 
gorithm has been implemented for solving the sys- 
tem of equations represented by equation 8. The 
coefficient matrix A is non-symmetric, banded and 
sparse. The knowledge of the sparse, banded struc- 
ture of A is exploited by storing only the five (for a 
first-order left-hand-side implicit formulation) non- 
zero block diagonals of the matrix — each element 
in a diagonal being a 4*4 block matrix. All matrix- 
vector multiplications are performed using the algo- 
rithm of_Madsen et. al 17 , which uses only the diago- 
nals of the sparse matrix to effect the multiplication. 
This algorithm is used since it is consist en t with The 
storage scheme for the matrix A in this paper. 

As described earlier, GMRES solves the sys- 
tem of linear equations at each global iteration by 
performing a number of sub-iterations within each 
global iteration. For the overall efficiency of the 
preconditioned GMRES solver, the number of sub- 
iterations, fc, needs to be ‘limited’, at each time- 
step. One approach often adopted is to fix A; 6 ’ 8 , 
i.e., perform a predetermined (user-specified) num- 
ber of sub-iterations at each time step. A different 
approach, which is used in this research, is to as- 
sign a ‘stopping criteria’ based on the reduction in 
the norm of the initial residual vector of the linear 
system, i.e., to terminate the sub-iterations when 

1 |rjfc||/||ro|| < e ) where e < 1. In practice, this trans- 
ates to truncation of the orthogonalization process 


after k steps, in step 2 of the GMRES algorithm. 
The use of a ‘stopping criteria’ provides a flexible, 
rather than a fixed k. Thus, the issue is to choose an 
e for accuracy value) for each global iteration which 
will provide a globally stable and efficient iterative 
scheme. 

During this research, it was observed through 
numerical experimentation that e can be related to 
the Courant number (A) of the time-integration by 
the following relation: 


e - 0.5 

1 

~~ log 10 (A 2 ) 


0 < A < 10 
A > 10 


(13) 


The use of the above criteria for e provides a stable 
GMRES scheme for the low-speed test case in this 
research. It is also valid for both the precondition- 
ers (LUSSOR and Block ILUF) used with GMRES. 
It should be remarked that the use of equation 13 to 
specify s as the stopping criterion successfully limits 
k to values below 10, for values of A upto 1000 (as 
used in this research). This criteria was also suc- 
cessful when subsequently applied to the transonic 
and hypersonic flow test cases of reference 9. 

Some issues related to storage requirements are 
now discussed. Recall that the global implicit ma- 
trix has a block, banded structure with a known 
sparsity pattern. All results presented in this work 
involve a first-order accurate discretization of the 
implicit operator, thus creating an implicit matrix 
with five well-defined diagonals. Hence, for a prob- 
lem size of N (i.e., N grid points), storage corre- 
sponding to 5 *N blocks (each of size 4*4) is required 
for the implicit coefficient matrix. This storage is 
required for both the solvers (GMRES and LGSR) 
tested in this research. The use of precondition- 
ing in conjunction with the GMRES solver neces- 
sitates the storage of the corresponding precondi- 
tioning matrix, M. The additional storage depends 
on the particular preconditioner, and varies from N 
blocks (for block diagonal and LUSSOR precondi- 
tioning) to 5*A r blocks (for block ILU precondition- 
ing witn no fill-in). 

Recall, that each linear system solve with GM- 
RES requires k sub-iterations, which translates to 
k * N * 4 additional storage locations for the sub- 
iterate vectors. As k becomes large (for high 
Courant numbers and stiff problems), the storage 
cost associated with GMRES may become signifi- 
cant. In this research, it has been demonstrated 
that the use of equation 13 establishes an upper 
limit of k — 10. The corresponding storage of 
size 40 * N locations has been extracted from ex- 
isting ‘temporary’ storage in the code, i.e., storage 
is shared by GMRES and other subroutines in the 
code. Hence, the need for additional storage for the 
sub-iterate vectors has been successfully eliminated 
in the implementation of preconditioned GMRES in 
this research. 
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Test Results and Discussion 

The problem of computing low-speed flow over 
a backward-facing step was selected to demonstrate 
the effectiveness of preconditioned GMRES over a 
conventional Line Gauss Seidel Relaxation (LG SR) 
solver. This flow problem illustrates the phenom- 
ena of flow separation and recirculation in internal 
flows. Extensive experimental and theoretical in- 
vestigations have been performed for this flow by 
Armaly et. al 18 . 

All computations are done for laminar flow and 
have been performed using the optimum vector pro- 
cessing facilities on a single processor of a Cray- 
YMP/832. All flow variables are second-order ac- 
curate, fully- up winded in the £ direction, and third- 
order accurate, upwind-biased in the rj direction. 
All boundary conditions are first-order accurate. 
Conserved variables are used for interpolation of 
cell-centered values to cell-face values. The implicit 
fl eft-hand-side), global operator is discretized in a 
first-order accurate manner. This is done to assure 
stability of the LG SR solver, and to save on storage 
and computational costs at each global iteration. 
All boundary conditions are linearized consistently, 
and are included in the implicit coefficient matrix. 

The computation is started with freestream 
flow as the initial guess. The relaxation parame- 
ter (w) for LGSR is set equal to one. It must be 
remarked that w > 1 (i.e., over-relaxation) is not 
a stable choice for the LGSR solver for the partic- 
ular test case investigated here, and w < 1 (i.e., 
nnder-relaxation) can only serve to slow down the 
convergence rate. Alternate sweeps in both the hor- 
izontal and vertical directions are used for stability 
of the LGSR solver. 

Excellent qualitative comparisons with experi- 
mental data have been obtained for the test problem 
examined. However, it is not the intent of this work 
to present detailed comparisons of computational 
and experimental data. Detailed code validation re- 
sults have been presented for the code being used in 
this research in an earlier paper 19 . The results pre- 
sented in this work stress the relative performance 
of the solvers and preconditioners being examined, 
which is consistent with the overall goals of this pa- 
per. 

Performan ce Indices for Comp arisons of Solvers 

A standard and widely accepted approach to 
judge the performance of linear system solvers is to 
stuay the convergence rate provided by the particu- 
lar solver. The idea is to examine the rate of conver- 
gence of the global non-linear problem as a function 
of the number of global iterations (or time steps or 
solution updates). In this research, the compari- 
son of solvers is based on the number of iterations 
required to reduce the I 2 norm of the residual vec- 
tor (normalized by the norm of the initial residual 
vector) of the non-linear problem by ten orders-of- 
magnitude. 

It is often argued, and apparently correctly 
so, that a 2-3 ordei-of-magnitude reduction of the 


non-linear residual is sufficient to obtain solutions 
within the limits of engineering accuracy. Hence, 
there seems to be no practical justification in im- 
posing a seemingly strict criteria of a ten orders- 
of-magnitude reduction for the residual. However, 
there are several instances when this strict criteria 
is required, and often necessary. 

Firstly, if the initial guess to the solution is 
close to the true solution, then a greater (than 2- 
3 orders) reduction in the residual is necessary to 
provide the correct steady-state solution. Such in- 
stances arise when, say, a small perturbation (by 
changing the set of boundary conditions or the grid) 
is applied to a known solution (on a particular set of 
boundary conditions and grid). Secondly, very high 
residual reductions are often required when results 
from CFD codes are used as inputs to perform opti- 
mization of design parameters. In such cases, results 
to engineering accuracy are unacceptable since they 
may introduce large errors while evaluating sensi- 
tivities of critical design parameters. Thirdly, if the 
underlying non-linear problem is stiff, then a 2-3 or- 
ders residual reduction may not be sufficient, even 
to provide results to engineering accuracy. This 
has been clearly illustrated by the results of the 
backward-facing step test case in this research. 

One of the issues that can determine the suc- 
cess of a solver is the amount of computational or 
Central Processing Unit (CPU) time required by the 
solver to perform each global iteration. The CPU 
time of an algorithm can be heavily influenced by 
the skills of the individual programmer. Efficient 
implementations often require investments in ‘real' 
time and patience by the programmer. In addition, 
an implementation on one particular machine may 
run faster or slower on another machine, i.e., the 
CPU time may be machine dependent. Matters are 
further complicated by factors like compiler opti- 
mization, vectorization and parallelization of the al- 
gorithm. 

All the test results presented in this research 
were obtained with vector implementations of algo- 
rithms executed on a computer with vector process- 
ing facilities. Hence, the CPU time comparisons of 
the preconditioned GMRES and LGSR solvers (and 
the comparisons of preconditioners) do account for 
the intrinsic differences in their levels of vectoriza- 
tion. Parallel and distributed computing can also 
afford rapid reductions in the ‘turn-around’ time of 
an algorithm. This research has focussed on vec- 
torization issues in detail, and no efforts have been 
presently made to parallelize any of the solvers or 
preconditioners. It is however well accepted that 
Conjugate Gradient like algorithms can be success- 
fully adapted for use with parallel architectures 20 . 

It may be remarked that the number of iter- 
ations to convergence is completely independent of 
all the factors that influence the CPU time, and can 
thus be said to reflect the true convergence charac- 
teristics of a solver. The CPU time comparisons 
are useful, and in conjunction with the number of 
iterations comparisons, provide an estimate of the 
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practical utility of a solver. 


Details of N umerical Com p utations 

Armaly et. al 18 have presented detailed mea- 
surements of velocity distribution and reattachment 
length for the incompressible flow of air downstream 
of a single backward facing step in a 2-D chan- 
nel. The results show that the various flow regimes 
(in the Reynolds number range of 70< Re <8000), 
are characterized by typical variations of separation 
length with Reynolds number. The Reynolds num- 
ber is based on twice the height of the inlet channel, 
and two-thirds of the maximum inflow velocity at 
the step. The particular test cases chosen for this 
research corresponds to Re < 400, since the exper- 
imental data suggests that Re > 400 produces 3-D 
variations or turbulence in the flowfield. 

The numerical computations are performed on 
an H-mesh with 61 and 51 points in the £ (stream- 
wise) and 7 ] (normal) directions, respectively. The 
grid is shown in Figure 1. Grid points aTe clustered, 
both in the normal and streamwise directions, to 
resolve the various viscous gradients and the reat- 
tachment point of the separated flow. A freestream 
Mach number of M & = 0.1 is specified. Four differ- 
ent cases with Reynolds number of Re <*, = 100, 200, 
300 and 389 are computed. The thin-layer approxi- 
mation to the Navier-Stokes equations is used. The 
full Navier-Stokes terms are not included during the 
computations because the coarseness of the grid in 
the streamwise direction will prevent resolution of 
these terms. Moreover, as will be shown later, ex- 
tremely accurate physical results are obtained with 
the thin-layer Navier-Stokes terms. 

BACKWARD-FACING STEF 
61x51 GRID 






Figure 1. Computational Grid 

Adiabatic, no slip boundary conditions are used 
on the top and bottom walls forming the boundaries 
of the channel, and on the lower portion (which de- 
fines the step) of the inflow boundary. For fully 
developed subsonic flow at the outflow boundary, 
three variables (p, u and v) are extrapolated and 
the pressure is determined by fixing the stagnation 
enthalpy. It must be remarked that this particular 
outflow boundary condition performed considerably 
better than a fixed static pressure condition at the 
outflow. 

The computational code used in this research 
is designed to perform single-grid calculations only. 
Hence, it is not possible to simulate the entire exper- 
imental setup of reference 18 , as this would require a 
minimum of two separate grids. It was thus decided 
to omit the inlet channel (before the step) in the 
computations, and simulate its effect by imposing a 
fully-developed profile for laminar flow, at the step. 


This, in turn, created a considerable challenge for 
the specification of the inflow boundary condition 
at the step. 

According to characteristic wave theory, the 
proper specification of boundary conditions for sub- 
sonic inflow requires one physical boundary condi- 
tion and three numerical boundary conditions to be 
satisfied. Note that only one physical quantity can 
be fixed, and the complete specification of a fixed 
inflow (parabolic in this case) velocity profile re- 
quires both components of velocity (u and i>) to 
be fixed. Hence, it seems unlikely that a purely 
parabolic velocity profile, that will remain fixed as 
the time-integration proceeds, can be maintained at 
the inflow boundary. 

One approach often used in external flow cal- 
culations is to specify a variation of stagnation en- 
thalpy at the inflow, which in turn provides the de- 
sired velocity profile. This, however, did not prove 
to be a stable inflow boundary condition for this in- 
ternal flow case. Several other variations of the in- 
flow boundary condition were attempted, but none 
of them yielded satisfactory results. Some of these 
variations are listed below. Note, that the stagna- 
tion enthalpy and entropy are fixed, for all these 
variations. The subscripts & and , refer to boundary 
and interior values, respectively. 

i) Specify profile of Mach number ; = V{ 

iij Specify profile of u velocity ; v j, = v % 

iii) Specify profile of u velocity ; pf, — p % 

iv) Specify profile of total velocity ; r*, = i' x 

v ) vi — 0 ; ut — u t 

vi) v b = 0 ; p b — pi 

The problem is resolved by imposing a profile 
of Reimann invariants at the inflow boundary. The 
velocity profile obtained with this boundary con- 
dition does vary in time, but is sufficiently stable 
to simulate the incoming flow in an accurate man- 
ner. Since the Reimann invariants act as a non- 
reflecting boundary condition 21 , their use helps in 
rapidly eliminating the initial transients in the so- 
lution, as will be shown later by the convergence 
history results. 

Figure 2 shows Mach number contours obtained 
from the computations on the 61 + 51 grid for the 
Re = 389 case. The nature and size of the sep- 
aration and recirculation behind the step closely 
matches the physical description of the flow as ob- 
tained in the experiments of Armaly et. al 18 . Sim- 
ilar results were also obtained for lower Reynolds 
numbers of Re — 100, 200 and 300, 

MACH NUMBER 
RANGE=0.0, 0.I 
INTER VA1 .=0.003 



Figure 2. Mach Number Contours 

It should be remarked that the reattachment 
point is the primary flow feature used to charac- 
terize the flow in the experimental data 18 . For 
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Re — 389, the experimental results suggest a down- 
stream reattachment length (Xr) to step height (5) 
ratio of Xr/S = 7.9 . The numerical computation 
predicts Xr/S = 7.95. The corresponding com- 
puted values for Re — 100, 200, and 300 are 2.97, 
5.09 and 6.58, respectively. The experimental val- 
ues are 3.1, 5.2 and 6.7, respectively. A comparison 
of experimental and numerical reattachment lengths 
is presented in figure 3. It can be seen that the 
experimental and numerical reattachment lengths 
are close to each other. This represents particu- 
larly good numerical solutions, in light of the fact 
that onset of reattachment is more difficult to pre- 
dict than onset of separation. The reattachment 
point for each case is confirmed by inspecting the 
velocity profiles for each case. An example set of 
velocity profiles is shown in figure 4 (for Re — 389). 
Velocity profiles obtained from numerical solutions 
are compared with available experimental data in 
figure 5 (ite = 100) and figure 6 (Re = 389). The 
differences between the experimental and computed 
profiles can be attributed to the excessive numerical 
diffusion introduced by the Van-Leer flux-splitting 
scheme used in this research. Recent tests with 
the flux-splitting scheme of Liou and Steffen 22 show 
much better agreement with experimental data. 
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Figure 3. Reattachment Length vs. Reynolds No. 
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Figure 4. Velocity Vectors for Re = 389 


Comparis ons of Solvers and Preconditioners 

Figure 7 presents the convergence history com- 
parisons between GMRES with LUSSOR precondi- 
tioning (GMLUS), GMRES with BILUF precondi- 



Figure 6. Velocity Profiles for Re = 389 


tioning (GMILU) and the Line Gauss-Seidel Relax- 
ation (LGSR) soiver. The ‘logarithm of the I 2 norm 
of the residual’ has been plotted against the ‘number 
of global iterations’ (or time steps) . It can be clearly 
seen that both GMLUS and GMILU converge at a 
much faster rate than the LGSR solver. 

The ‘correct’ physical solution (i.e., the proper 
reattachment point) is obtained after a six order re- 
duction of the residual is achieved. The rapid reduc- 
tion in the initial residual (from zero to three orders) 
is a result of using the Reimann invariants at the in- 
flow boundary (as discussed earlier). This rapid re- 
duction represents elimination of part of the initial 
transient. The maximum (or asymptotic) Courant 
numbers (A) for GMLUS, G MILU an d LGSR are 
200, 200 and 10, respectively. The maximum A for 
LGSR is the value of A above which LGSR becomes 
unstable. The maximum A for GMLUS and GMILU 
is the value of A above which the number of sub- 
iterations exceeds 10. Curves A (for GMLUS) and B 
(for GMILU) are generated with an initial A = 100, 
with an increase to A = 200 after 100 global itera- 
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lions. The LGSR solver (Curve C) ‘permits’ an ini- 
tial Courant number of A = 10, and does not permit 
any increase in the Courant number (i.e. , the iter- 
ates diverge at higher values of A), even after the 
correct physical solution has been obtained. 

GMLUS and GMILU converge in approxi- 
mately 2000 global iterations. LGSR, however, 
reaches a six order reduction (i.e., the correct phys- 
ical solution) in 15,000 global iterations and is esti- 
mated to require about 56,000 iterations (according 
to convergence rate estimates) to attain a ten-order 
reduction (i.e., convergence). Hence, in terms of the 
number of overall global iterations to convergence, 
GMRES (with either preconditioner) is significantly 
(about 28 times) faster than the LGSR solver. 

The use of different values of A for the pre- 
conditioned GMRES and LGSR solvers may raise 
some question about the ‘fairness’ of the conver- 
gence rate comparisons, particularly since compar- 
atively higher values of A are used with precondi- 
tioned GMRES. The superior convergence rate of 
the preconditioned GMRES solver can be attributed 
to two factors — the higher “allowed” values of A, 
and the use of preconditioning. The role of pre- 
conditioning in accelerating convergence can be de- 
termined by using preconditioned GMRES at the 
same A as the LGSR solver — the result is that 
preconditioned GMRES continues to have a supe- 
rior convergence rate than the LGSR solver, al- 
though the speed-up factor is reduced,. In the ab- 
sence of any preconditioning, it was observed that 
the GMRES solver requires considerably more sub- 
iterations (than the upper limit often sub-iterations 
with preconditioned GMRES) to provide an overall 
stable iterative scheme (even at low Courant num- 
bers of order one). The unpreconditioned GMRES 
is thus, impractical as a solver at low Courant num- 
bers, and is unstable at high Courant numbers. 

It must be remarked that the use of precondi- 
tioning enables the GMRES solver to accept higher 
values of A than those “allowed” by the LGSR 
solver. It is hence only prudent to take advantage 
of the increased stability afforded by the use of pre- 


conditioning, and increase the convergence rate of 
the preconditioned GMRES solver by using large 
Courant numbers. The maximum speed-up is thus 
obtained by using the maximum allowable values 
of A. The preconditioned GMRES solver can ac- 
cept even higher values of the Courant number than 
those used in this research, but this requires an in- 
crease in the number of sub-iterations (and hence, 
increase in the storage and CPU time per time- 
step), without any appreciable increase in the over- 
all convergence rate. The choice of A is thus influ- 
enced by the need to obtain the best computational 
efficiency with the preconditioned GMRES solver, 
and by the need to limit the storage for the sub- 
iterates (as discussed earlier). 

The CPU time comparison of the solvers is 
shown in figure 8. The superior efficiency of GMLUS 
over GMILU and LGSR is seen in this comparison. 
GMLUS and GMILU require 750 and 3750 seconds, 
respectively, to converge to a steady-state solution. 
The reason for this difference in CPU times is that 
the Block Incomplete L-U Factorization (BILUF) 
preconditioner used in the GMILU solver requires 
(partially vectorized) computation of the ‘1/ and ‘U’ 
factors at each global iteration (set-up cost), and 
subsequent forward and backward (scalar) solves 
with the l L’ and *U’ factors at each sub-iteration. 
The block-LUSSOR preconditioner used in the GM- 
LUS solver requires virtually no set-up time, and in- 
volves two (partially vectorized) block-diagonal in- 
versions across the domain at each sub-iteration 3 . 
This huge difference in computational overhead is 
the reason why GMILU (using BILUF) requires al- 
most five times as much CPU time as GMLUS (us- 
ing LUSSOR). The LGSR solver is again extremely 
slow in terms of total CPU time, as it was in terms 
of number of iterations. LGSR is estimated to re- 
quire about 15,000 seconds of CPU time for conver- 
gence. Thus, LGSR requires about 20 times more 
CPU time than that required by GMLUS. 



0 ?G00 4000 

CPU TIMC (s) 

Figure 8. ‘CPU Time’ Comparison 

One of the goals of this research is to identify 
stable and efficient preconditioners for the GMRES 
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algorithm, for use with the Navier-Stokes equations. 
Note, that the ‘number of iterations’ comparison 
shows that both the LUSSOR and BILUF precon- 
ditioners are stable, and are equally effective in con- 
verging to the steady state. The CPU time compar- 
ison shows that the use of LUSSOR preconditioning 
can afford considerable gains in CPU time over the 
use of the BILUF preconditioner, while maintaining 
a competitive convergence rate. 

The CPU time comparisons reveal that the 
speed-up factors with the LUSSOR preconditioner 
are much higher than those for the BILUF pxecon- 
ditioner. The major reason for this difference may 
be the lack of vector inability of the BILUF precon- 
ditioner. The CPU time for the BILUF precondi- 
tioner can be reduced by reusing the ‘L* and ‘U’ 
factors over a (user fixed) number of global itera- 
tions. This translates to performing the incomplete 
LU decomposition only every t th iteration, and then 
reusing the incomplete LU for the next (t— 1) global 
iterations. This greatly reduces the overhead, cost of 
computing the BILUF preconditioner at each time- 
step, and makes BILUF more competitive in terms 
of CPU time. 

A thorough mathematical analysis of the com- 
peting solvers in this paper could provide an in- 
depth explanation of why one solver or precondi- 
tionei is better than another. Such an analysis 
would require a detailed eigenvalue analysis of each 
iterative scheme. Unfortunately, it is practical to 
do such an analysis only for small matrices, as may 
be encountered in 1-D problems or in 2-D problems 
with very coarse meshes, A coarse mesh for the test 
problems of this paper could be adopted for this 
purpose, but then there is no theory to provide a 
one-on-one correspondence between eigenvalues for 
a coarse and fine mesh. 

It is possible to obtain estimates of extremal 
(maximum and minimum) eigenvalues for the GM- 
RES solver 23 . However, the usefulness of such es- 
timates is limited and questionable since a) the es- 
timates may not be accurate, b) the estimates are 
not available when GMRES is preconditioned, and, 
c) extremal eigenvalues may provide an accurate 
condition number estimate, but more information 
(about the distribution of interior eigenvalues) is 
required to judge the effectiveness of any particu- 
lar pTeconditioner. Thus, even though a complete 
eigenvalue analysis is desirable to fully explain the 
differences in convergence rates of the solvers, it 
is impractical for the problem being tested in this 
research. The development of cheap and accurate 
methods to perform such eigenvalue analyses could 
serve as a good source of future work. 

Conclusions 

The applicability of the preconditioned GM- 
RES solver to low-speed (or incompressible) flows 
has been demonstrated by computing the incom- 
pressible flow over a backward facing step. The 
new solver performs with considerable success when 
compared to a conventional Line Gauss-Seidel Re- 


laxation solver. Remarkable speed-ups in conver- 
gence rate are afforded by the new solver. When 
the ‘number of iterations to convergence 5 is used as a 
comparison criterion, both the preconditioned GM- 
RES solvers (using LUSSOR and BILUF precon- 
ditioners, respectively) are approximately 28 times 
faster than the LGSR solver, for the backward- 
facing step test case. For ‘CPU time to convergence’ 
as a comparison criterion, GMRES with LUSSOR 
preconditioning is 20 times faster than the LGSR 
solver. 

This research establishes that the LUSSOR 
(Lower Upper Symmetric Successive Over Relax- 
ation) scheme of Jameson and Yoon 16 can he suc- 
cessfully used as a preconditioner for the GMRES 
solver, for incompressible flow applications. The 
modified, block LUSSOR scheme used in this re- 
search outperforms the more popular precondition- 
ers based on incomplete factorization(s) of the iter- 
ation matrix e.g. BILUF (Block Incomplete Lower- 
Upper Factorization with zero fill-in). This rein- 
forces the findings of reference 9 , where the superi- 
ority of the LUSSOR preconditioner was established 
for compressible (transonic and hypersonic) flows. 

The ability of the LUSSOR preconditioner to 
be as effective in accelerating convergence as the 
BILUF preconditioner is a significant finding related 
to the development of preconditioners. BILUF has 
been one of the most popular preconditioning tech- 
niques used by researchers working with conjugate- 
gradient methods. The present work has shown that 
the LUSSOR technique can match BILUF in stabil- 
ity and effectiveness, and is considerably more effi- 
cient in terms of the overhead costs associated with 
the preconditioning effort. 

One of the major issues raised in context with 
methods which require sub-iterations to solve lin- 
ear systems is the stopping criteria used to termi- 
nate the sub-iterations. The development of a uni- 
formly applicable stopping criteria for a variety of 
flow phenomena represents an important step in us- 
ing preconditioned GMRES as a universal solver. 
A method for selection of a stopping criteria is ob- 
tained in this research. The criteria is based on the 
Courant number of the global iteration, and can be 
easily automated for different problems. The crite- 
ria is based on empirical evidence, but works well 
for the incompressible flow test case in this research 
(the same criteria also worked equally well for com- 
pressible flow problems of reference 9). 

The success of the preconditioned GMRES 
solver in accelerating convergence for low-speed 
flows has been clearly demonstrated. The LUS- 
SOR scheme has been identified as a computation- 
ally cheap, yet effective pieconditioner. A uniform 
stopping criteria for the GMRES solver has been de- 
veloped. All the major results of this research, when 
combined with the results of reference 9, serve as a 
significant step towards the development of a univer- 
sal flow solver for compressible and incompressible 
flow problems. 


10 



Acknowle dgem ents 


This work was sponsored as part of the Grad- 
uate Research Program in Aeronautics at NASA 
Lewis Research Center. The authors are indebted 
to Drs. M. Goldstein and L. A. Povinelli for their 
support of this research. 

References 

1. Thomas, J. L., Van Leer, B., and Walters, 
R. W., “Implicit Flux-Split Schemes of the 
Euler Equations,” AIAA Paper 85-1680, July 
1985. 

2. Beam, R. M., and Warming, R. F., “An 
Implicit Factored Scheme for the Compress- 
ible Navier-Stokes Equations,” AIAA Journal , 
Vol. 16, No. 4, April 1978, pp. 393-402. 

3. Ajmani, K., “Preconditioned Conjugate Gradi- 
ent Methods for the Navier- Stokes Equations,” 
Ph.D Dissertation, Virginia Polytechnic Insti- 
tute and State University, Blacksburg, VA, 
1991. 

4. Hestenes, M. R., and Steifel, E., “Methods of 
Conjugate Gradients for Solving Linear Sys- 
tems,” J. Res. Nat. Bur. Stand., Vol. 49, 1952, 
pp. 409-436. 

5. Wong, Y. S., and Hafez, M. M., “A Mini- 
mal Residual Method for Transonic Potential 
Flow,” ICASE Report 81-15, June 1982. 

6. Wigton, L. B., Yu, N. J., and Young, D. P., 
“GMRES Acceleration of Computational Fluid 
Dynamics Codes,” AIAA Paper 85-1494, Jan- 
uary 1985 (also in AIAA Journal , Vol. 29, 
No. 7, July 1991, pp. 1092-1100). 

7. Saad, Y., and Schultz, M. H., “GMRES: A 
Generalized Minimal Residual Algorithm for 
Solving Nonsymmetric Linear Systems,” SIAM 
Journal of Scientific and Statistical Computing , 
Vol. 7, 1986, pp. 856-869. 

8. Venkatakrishnan, V., “Preconditioned Conju- 
gate Gradient Methods for the Compressible 
Navier-Stokes Equations,” AIAA Paper 90- 
0586, January 1990. 

9. Ajmani, K., Ng, W. F., and Liou, M. S., 
“Generalized Conjugate Gradient Methods for 
the Navier-Stokes Equations,” AIAA Paper 91- 
1556, June 1991. 

10. Van Leer, B., “Flux Vector Splitting for the 
Euler Equations,” Lecture Notes in Physics , 
Vol. 170, 1982, pp. 507-512 (also ICASE Re- 
port 82-30, September 1982). 

11. Reid, J. K., “On the Method of Conjugate Gra- 
dients for the Solution of Large Sparse Lin- 
ear Equations,” in Large Sparse Sets of Linear 


Equations , Academic Press, New York, 1971, 
pp. 231-254. 

12. Young, D., and Jea, K. C., “On the Simplifica- 
tion of Generalized Conjugate Gradient Meth- 
ods for Nonsymmetrizable Linear Systems,” 
Linear Algebra and hs Applications , Vol. 52, 
1983, pp. 399-471. 

13. Axelsson, O., “Conjugate Gradient Type Meth- 
ods for Unsymmetric and Inconsistent Systems 
of Linear Equations,” Linear Algebra and its 
Applications , Vol. 29, 1980, pp. 1-16. 

14. Arnoldi, W. E., “The Principle of Minimized 
Iteration in the Solution of the Matrix Eigen 
value Problem,” Quart. Appl. Math.. Vol. 9, 
1951, pp. 17-29. 

15. Meijerink, J. A., and Van der Vorst, II. A., 
“Guidelines for Usage of Incomplete Decompo- 
sitions in Solving Sets of Linear Equations as 
they occur in Practical Problems,” Journal of 
Computational Physics , Vol. 44, 1981, pp. 134— 
155. 

16. Yoon, S., and Jameson, A., “An LQ-SSOR 
Scheme for the Euler and Navier-Stokes Equa- 
tions,” AIAA Paper 87-0600, January 1987. 

17. Madsen, N. K., Rodrigue, G. IT., and Karush, 
J. I., “Matrix Multiplication by Diagonals on 
a Vector/Parallel Processor,” Info. Proc. Let- 
ters , Vol. 5, 1976, pp. 41-55. 

18. Armaly, B. F., Durst, F., Pereira, J. C. F., 
and Schonung, B., “Experimental and Theo- 
retical Investigation of Backward Facing Step 
Flow,” Journal of Fluid Mechanics , Vol. 127, 
1983, pp. 473-496. 

19. Ng, W. F., Mitchell, C. R., Ajmani, IT., Tay- 
lor, A. C., and Brock, J. S., “Viscous Analysis 
of High Speed Flows Using an Upwind Finite 
Volume Technique,” AIAA Paper 89-0001, Jan- 
uary 1989. 

20. Meurant, G., “Practical Use of the Conjugate 
Gradient Method on Parallel Supercomputers,” 
Computer Physics Communications , Vol. 53, 
1989, pp. 467-477. 

21. Giles, M. B., “Non-reflecting Boundary Condi- 
tions for Euler Equation Calculations,” AIAA 
Paper 89-1942, June 1989. 

22. Liou, M-S, and Steffen, C. J., Jr., “A New Flux 
Splitting Scheme,” Journal of Computational 
Physics , to appear. 

23. Saad, Y., “Variations on ArnoldPs Method foT 
Computing Eigenelements of Large Unsymmet- 
ric Matrices,” Linear Algebra and its Applica- 
tions , Vol. 34, 1980, pp. 269-295. 


11 



REPORT DOCUMENTATION PAGE 


Form Approved 
OMB No 0704-0188 


Public reporting burden for this collection of information Is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, 
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this 
collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for information Operations and Reports, 1215 Jefferson 
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503 


1. AGENCY USE ONLY [Leave blank) 


4. TITLE AND SUBTITLE 


2. REPORT DATE 


January 1993 


3. REPORT TYPE AND DATES COVERED 

Technical Memorandum 


Preconditioned Conjugate-Gradient Methods for Low-Speed Flow Calculations 


6. AUTHOR(S) 


Kumud Ajmani, Wing-Fai Ng, and Meng-Sing Liou 


5. FUNDING NUMBERS 


WU -505-62-21 


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 

National Aeronautics and Space Administration 
Lewis Research Center 
Cleveland, Ohio 44135-3191 


8. PERFORMING ORGANIZATION 
REPORT NUMBER 


E-7438 


9. SPONSORING/MONITORING AGENCY NAMES(S) AND ADDRESS(ES) 

National Aeronautics and Space Administration 
Washington, D.C. 20546-0001 


10. SPONSORING/MONITORING 
AGENCY REPORT NUMBER 


NASA TM- 105929 



11. SUPPLEMENTARY NOTES 

Prepared for the 31st Aerospace Sciences Meeting and Exhibit sponsored by the American Institute of Aeronautics and Astronautics, Reno, Nevada, 

January 1 1-14, 1993. Kumud Ajmani, Institute for Computational Mechanics in Propulsion, NASA Lewis Research Center (work funded under Space Act 
Agreement C99066G); Wing-Fai Ng, Department of Mechanical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061 ; 
Meng-Sing Liou, NASA Lewis Research Center. Space Act Monitor, Louis A. Povinelli. 


12a. DISTRIBUTION/AVAILABILITY STATEMENT 

Unclassified - Unlimited 
Subject Category 34 


13. ABSTRACT (Maximum 200 words) 

An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative 
solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a 
backward-facing step is chosen as a representative test problem. The unsteady form of the two-dimensional, 
compressible Navier-Stokes equations is integrated in time using discrete time-steps. The Navier-Stokes equations are 
cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear 
systems of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to 
enhance the stability and convergence rate of the solver, and are found to be critical to the overall success of the solver. 
A study of various preconditioners reveals that a preconditioner based on the Lower-Upper Successive Symmetric Over- 
Relaxation iterative scheme is more efficient than a preconditioner based on Incomplete L-U factorizations of the 
iteration matrix. The performance of the new preconditioned solver is compared with a conventional Line Gauss- Seidel 
Relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady- 
state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new 
preconditioned solver, when compared with the LGSR solver. 


14. SUBJECT TERMS 


Navier-Stokes equations; Linear system solvers; Preconditioning; Conjugate gradients; 
Convergence acceleration 


17. SECURITY CLASSIFICATION 
OF REPORT 

Unclassified 


18. SECURITY CLASSIFICATION 
OF THIS PAGE 

Unclassified 


19. SECURITY CLASSIFICATION 
OF ABSTRACT 

Unclassified 


15. NUMBER OF PAGES 

12 


16. PRICE CODE 

A03 


20. LIMITATION OF ABSTRACT 


NSN 7540-01 -280-5500 


Standard Form 298 (Rev. 2-89) 
Prescribed by ANSI Std. Z39-1B 
298-102 


























